9 research outputs found

Correlating Twitter language with community-level health outcomes

Author: Cieliebak Mark
Grubenmann Ralf
Jaggi Martin
Logean Séverine Rion
Schneuwly Arno
Publication venue: ZHAW Zürcher Hochschule für Angewandte Wissenschaften
Publication date: 01/01/2019
We study how language on social media is linked to diseases such as atherosclerotic heart disease (AHD), diabetes and various types of cancer. Our proposed model leverages state-of-the-art sentence embeddings, followed by a regression model and clustering, without the need for additional labelled data. It allows us to predict community-level medical outcomes from language, and thereby potentially translate these to the individual level. The method is applicable to a wide range of target variables and allows us to discover known and potentially novel correlations of medical outcomes with lifestyle aspects and other socioeconomic risk factors.

arXiv.org e-Print Archive

Infoscience - École polytechnique fédérale de Lausanne

Crossref

ZHAW digitalcollection

Twist Bytes : German dialect identification with data mining optimization

Author: Benites de Azevedo e Souza Fernando
Cieliebak Mark
Deriu Jan Milan
Grubenmann Ralf
von Däniken Pius
von Grünigen Dirk
Publication venue: VarDial
Publication date: 01/01/2018
We describe our approaches used in the German Dialect Identification (GDI) task at the VarDial Evaluation Campaign 2018. The goal was to identify which of four dialects spoken in the German-speaking part of Switzerland a sentence belonged to. We adopted two different meta-classifier approaches and used some data mining insights to improve the preprocessing and the meta-classifier parameters. In particular, we focused on different feature extraction methods and how to combine them, since they influenced the performance of the system very differently. Our system achieved second place out of 8 teams, with a macro-averaged F1 score of 64.6%. We also participated in the surprise dialect task with a multi-label approach.

ZHAW digitalcollection

Word unigram weighing for author profiling at PAN 2018 : notebook for PAN at CLEF 2018

Author: Cieliebak Mark
Grubenmann Ralf
von Däniken Pius
Publication date: 01/01/2018
ZHAW digitalcollection

Correlating Twitter Language with Community-Level Health Outcomes

Author: Cieliebak Mark
Grubenmann Ralf
Jaggi Martin
Logean Severine Rion
Schneuwly Arno
Publication venue: Association for Computational Linguistics (ACL), Stroudsburg
Publication date: 21/06/2020
Infoscience - École polytechnique fédérale de Lausanne

spMMMP at GermEval 2018 Shared Task: Classification of Offensive Content in Tweets using Convolutional Neural Networks and Gated Recurrent Units

Author: Benites Fernando
Cieliebak Mark
Grubenmann Ralf
von Däniken Pius
von Grünigen Dirk
Publication venue: oeaw
Publication date: 02/10/2018
In this paper, we propose two different systems for classifying offensive language in micro-blog messages from Twitter ("tweets"). The first system uses an ensemble of convolutional neural networks (CNN), whose outputs are then fed to a meta-classifier for the final prediction. The second system uses a combination of a CNN and a gated recurrent unit (GRU) together with a transfer-learning approach based on pretraining with a large, automatically translated dataset.

Elektronisches Publikationsportal der Österreichischen Akademie der Wissenschaften

spMMMP at GermEval 2018 Shared Task: Classification of Offensive Content in Tweets using Convolutional Neural Networks and Gated Recurrent Units

Author: Benites Fernando
Cieliebak Mark
Grubenmann Ralf
von Däniken Pius
von Grünigen Dirk
Publication venue: oeaw
Publication date: 02/10/2018
Elektronisches Publikationsportal der Österreichischen Akademie der Wissenschaften

SB-CH : a Swiss German corpus with sentiment annotations

Author: Cieliebak Mark
Deriu Jan Milan
Grubenmann Ralf
Tuggener Don
von Däniken Pius
Publication venue: European Language Resources Association
Publication date: 01/01/2018
ZHAW digitalcollection

spMMMP at GermEval 2018 shared task : classification of offensive content in tweets using convolutional neural networks and gated recurrent units

Author: Benites de Azevedo e Souza Fernando
Cieliebak Mark
Grubenmann Ralf
von Däniken Pius
von Grünigen Dirk
Publication venue: ÖAW Austrian Academy of Sciences
Publication date: 01/01/2018
In this paper, we propose two different systems for classifying offensive language in micro-blog messages from Twitter ("tweets"). The first system uses an ensemble of convolutional neural networks (CNN), whose outputs are then fed to a meta-classifier for the final prediction. The second system uses a combination of a CNN and a gated recurrent unit (GRU) together with a transfer-learning approach based on pretraining with a large, automatically translated dataset.

ZHAW digitalcollection

Elektronisches Publikationsportal der Österreichischen Akademie der Wissenschaften